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Abstract 



This paper introduces an iterative scheme for acoustic model inversion where the notion of 
proximity of two traces is not the usual least-squares distance, but instead involves registration 
OO as in image processing. Observed data are matched to predicted waveforms via piecewise- 
^"^ polynomial warpings, obtained by solving a nonconvex optimization problem in a multiscale 
^^ fashion from low to high frequencies. This multiscale process requires defining low- frequency 
r ) augmented signals in order to seed the frequency sweep at zero frequency. Custom adjoint 
/^ sources are then defined from the warped waveforms. The proposed velocity updates are ob- 
^^ tained as the migration of these adjoint sources, and cannot be interpreted as the negative 
F^ gradient of any given objective function. The new method, referred to as RGLS, is success- 
es fully applied to a few scenarios of model velocity estimation in the transmission setting. We 
fl show that the new method can converge to the correct model in situations where conventional 
I— I least-squares inversion suffers from cycle-skipping and converges to a spurious model. 
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vn 1 Introduction 

^ 

^^ Waveform inversion via non-linear least-squares (LS) minimization (Tarantola and Valette, 1982) is 
CO 



effective when the starting model is accurate (Virieux and Operto, 2009), but otherwise suffers from 



stalled convergence to spurious local minima due to the oscillatory nature of the data. Traveltime 



• i-H tomography is typically used to generate an accurate initial model (Prieux et al., 2012; Pratt 

r> and Goulty, 1991; Bregman et al., 1989), which is then improved upon by waveform inversion. 



^ Frequency sweeps in full waveform inversion (FWI) (|Pratt| |1999| ) are another well-documented 

way to encourage convergence toward the global least-squares minimum, by fitting data from low 
to high frequencies. However, the lack of low- frequency data, or their corruption by noise, often 
hinders this frequency sweep approach. This paper proposes a method which succeeds at recovering 
the model velocity in some transmission scenarios, without traveltime picking, and even when the 
data only contain high frequencies. 

We propose a simple modification to least-squares-based waveform inversion where the residual 
of the adjoint-state equation, normally the difference d — u between the observed data d and 
predicted data u^ is replaced by a more geometrically meaningful quantity. We propose to let 
this more general residual go by the name adjoint source^ and form it by replacing the observed 
data d with a transported, or warped, version d of the predicted data toward the observed data. 
The rationale for this substitution is that the phases of d are in general off by more than one 



wavelength in comparison to those of u. In contrast, the new fractionally warped data d are defined 
so that their phases match those of the prediction u to within a small fraction of a wavelength. 
When the registration algorithm generating the warping succeeds, this method yields model velocity 
updates that properly fix time delays or advances without suffering from cycle skipping — although 
these velocity updates are no longer gradient descent steps. We observe that least-squares misfit 
optimization with this modified adjoint source can have a much enlarged basin of attraction. The 
method is referred to as registration- guided least-squares (RGLS) inversion. 

To find the best warping between an observed trace d and the corresponding predicted trace 
i/, we formulate and solve an optimization problem which, not unlike FWI, is itself nonconvex. 
The highly oscillatory nature of the traces is also what makes seismogram registration nontrivial. 
However, we show that the nonconvexity is tractable and can be handled by a continuation strategy, 
where the match is realized scale-by-scale in a careful, iterative fashion. The traces d and u 
usually do not contain useful low frequencies in exploration seismology, so the seeding problem 
of this multiscale iteration is as much an issue here as in classical FWI. We propose to solve 
this problem by introducing nonlinearly transformed signals which, by construction, contain low- 
frequency components. We refer to these convenient, nonphysical nonlinearly-transformed signals 
as low-frequency augmented (LFA) signals. The LFA transformation can be thought of as an ad-hoc 
pre-processing of the traces so as to create low frequencies, yet maintain much of the information at 
high frequencies. Subsequently, seismogram registration is realized through the match of the LFA 
of d to the LFA of ix by a warping function of limited complexity, such as a piecewise polynomial. 

Matching observed data and predicted data, or matching time-lapse images has already been 



used in many different applications under many different names: phase-shift (Tromp et al. , 2005), 



traveltime delay based on correlation (Luo and Schuster, 1991) image registration using optimal 



transport (Haker and Tannenbaum, 2001), curve registration (Ramsay and Li, 1998), registration 



using local similarities (Fomel and Jin, 2009; Fomel and van der Baan, 2010), and dynamic time 



warping for speech pattern matching (Sakoe and Chiba, 1978). Anderson and Gaby (1983) ap 



plied the dynamic waveform matching method to geophysics problems such as earthquake arrival 



time identification and waveform clustering. Hale (2013) applied the dynamic warping, which was 



originally developed for speech recognition, to analysis of time-lapsed seismic images. Magg i et al. 
( j2009, ) developed an algorithm to select time windows to extract time-shift between seismograms 
with iterative tomographic inversion in mind. Liner and Clapp (2004) studied alignment of seismic 



traces using dynamic programming for trace processing and interpretation. [Kennett and Fichtner 
(2012) introduced a transfer operator which maps seismograms similar to our polynomial mapping 



as a way to generalize comparison between traces. 

It is important to point out that the "fractional" character of the warping used to generate the 



adjoint source in this paper is in the same spirit as a solution proposed in Sava and Biondi (2004), 
where image perturbations in the image domain are replaced by their linearized version to miti- 
gate lack- of- convexity issues beyond the Born approximation. Similar suggestions of non-gradient 
updates generated from residuals involving 'infinitesimally modified" images are also proposed in 



recent work by Fei and Williamson (2010) and Albertin (2011). 



To the best of our knowledge, no studies have yet proposed to modify the adjoint source by 
replacing observed data with time- warped predicted data in order to enlarge the basin of attraction 
of FWI. Moreover, we hope to illustrate the benefits of considering piecewise polynomials to define 
mappings between two images or traces. Finally, the idea of nonlinearly transforming traces to 
generate low frequencies for seismogram registration seems new to seismic imaging. 

The paper is organized as follows. We start by explaining the motivation behind modifying 
the adjoint source to the adjoint state equation. We then detail and compare different trace 
registration methods. Seismogram registration at the trace level is demonstrated with synthetic 



noisy and noiseless data. The RGLS inversion method is then tested in several transmission cases, 
with velocity models that include low and high velocity zones. We show that the LS and RGLS 
methods behave significantly differently. We finish by discussing the limits of the proposed method. 
In a nutshell, seismogram registration requires comparable traces, which explains why we consider 
transmission rather than reflection examples in this paper. 

2 Guided least-squares with a modified adjoint source 

Full waveform inversion (FWI), in its standard form, tries to minimize the least-squares misfit 

^H^2^/ \^s(^r^t) -ds{Xr,t)\'^dt, (1) 

s,r -^ 

where m{x) denotes a velocity model, Ug — J-'s[m]i and dg are predicted and observed data at a shot 
s and at a receiver x^, respectively. For notational simplicity, the subscripts s and r are omitted 
whenever it does not cause confusion. In this paper the forward operator J-'s[m] maps a velocity 
model m to data Us(xr^t) through the acoustic wave equation, 

u 

where fs{x^t) is a source term. The adjoint-state method generates the gradient of J[m\ via the 
imaging condition 

— [m] = J q^(x,t)^{x,t)dt, (2) 

where the adjoint field Qs solves the adjoint wave equation backward in time with the data residual 
ds — Us in the right-hand side, and in the medium m. 

It is well known that the nonconvexity of J[m] is particularly pronounced way when the data 
are oscillatory. More specifically, when the time difference between corresponding arrivals in Ug 
and dg is larger than a half wavelength, the steepest descent direction of the data misfit may result 
in increasing those time differences, consequently increasing the model error. In order to guide 
the optimization in the direction of the correct model update, we propose to change the residual 
dg — Us in the adjoint wave equation by replacing dg by a version of Ug transported "part of the 
way" toward dg. We denote these virtual, transported data by dg and refer to them as fractionally 
warped data. Their construction is in the next section. The rationale behind dg is that its arrivals 
can now be less than a quarter of a wavelength apart from those in Ug. The substitution of dg by 
dg is illustrated in Figure [l| 

In order to generate a good candidate of fractionally warped data we propose to find piece- 
wise cubic polynomials p{t) and A{t) so as to have a good match d{t) ^ A{t)u{p{t))^ then define 
fractionally warped data as 

d{t) = [A{t)r u{{l - a)t + ap{t)) (3) 

with some very small < o^ <C 1. The warping p(t), the amplitude A(t), and the value of a are 
chosen so that fractionally warped data d(t) have a similar shape to that of the prediction u{t) 
but phase discrepancies smaller than a quarter of a wave period. In order to find p{t) and A(t)^ 
we propose a non-convex optimization scheme similar to image registration. The proposed FWI 
method therefore transfers all non-convexity to the registration problem at the trace level. We 
describe seismogram registration in detail in the next section. 
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Figure 1: Replacement of observed data with fractionally warped data, which are a mapped 
(transported slightly) version of the given predicted data towards the observed data. We call the 
new traces fractionally warped data. 



3 Seismogram registration 

3.1 Statement of the optimization problem 

In order to find A(t) and p(t) we propose to solve the following least-squares minimization problem 
for each trace: find p(t) and A{t) piecewise cubics that minimize 



W\p, A] = IJ \d{t) - A{t)u{pm^dt +IJ \p{t) - t\^dt. 



(4) 



The letter W stands for "warping". The functions A(t) and p{t) are realized as cubic Hermite 
splines over 4 fixed subintervals in the following examples, unless otherwise stated. (This paper 
does not address the problem where the spline nodes could also be determined by optimization.) 

Due to the oscillatory nature of the predicted and observed data, this registration problem is 
non-convex and suffers from the same cycle-skipping phenomenon as conventional least-squares 
FWI does. Simulated annealing or other Monte Carlo methods for global optimization could be 
performed but they are not tried in this paper. Instead, the minimization is carried out in a 
multiscale fashion by restricting the data d(t) and its prediction A{t)u(p(t)) to a slowly growing 
subset of frequencies, from the DC component to successively higher frequencies, as in frequency 
domain FWI (Plessix, 2009). However, the observed data d usually have small energy in the low 
frequency band and may be corrupted by noise. In order to start the sweep at zero frequency, 
we use modified traces D{t) and U{t), manufactured to contain low frequencies, instead of d{t) 
and u(t). We call D(t) and U{t) low-frequency augmented (LFA) signals. Hence, the optimization 
problem Q becomes: find p{t) and A(t) piecewise cubics that minimize 



W^faIp, M-\j \D{i) - Amipim^dt +^J \pit) - t\^dt. 



(5) 



Here are three reasonable possibilities for defining an LFA signal, U{t), from a band-limited 
signal u{t): 



Uh = u{t) + \u{t) + i{nu){t)\, 
Us = u\t), 

Ua^\u{t)\. 



(6a) 
(6b) 
(6c) 



We use / for the identity map and H for the Hilbert transform, defined in the frequency domain 
as T-Lu{cjj) = —isgn{(jj)u{cjj) (Benedetto, 1997), where sgn(-) is the sign function. 

The Hilbert transform completes any real signal with an imaginary part, so that u + iT-Lu is 
an "analytic" signal in the sense of having no negative frequency component. The amplitude 
y/u'^{t) + {l-Lu{t)Y has the interpretation of an envelope for the signal u + il-Lu. The Hilbert 
transform is a classical tool in signal processing; it is typically used for demodulation in seismic 
inversion (iBozda et al.l 120111). In a later section we argue that Uh is a particularly good LFA 



transformation. 

The piecewise cubic polynomials p{t) and A{t) can be written as 



p{t) = ^pkMt) = [Mt) Mt)---Mt)][po pi ... pn] 



k=0 



and 



A{t) = ^ak(l)k{t) = [(t)o{t) (f)i{t)...(f)n{t)][ao ai ... anf , 



k=0 



respectively, for a set of basis functions (/)/e(t), fc = 0, 1, 2, ..., n. The column vectors [po, pi, ..., p^]^ 
and [ao^ai^ ...,an]^ are denoted by p and a, respectively. Their components are the 2(n + 1) 
parameters to be determined per trace. 

The gradient and the Hessian matrix of VFlfa with respect to p and a can be found analytically, 
e.g., 

^^ = J[D-AU{p)][-AU\p)^,]-^ip-t)Mt. (7) 

(H^),, = |^ = J[[AU\p)]'-[D-AU{p)][AU'{p)] + l]^,^,dt. (8) 

Similar expressions can be derived for the gradient and Hessian with respect to the coefficients of 
A{t). The integrals are computed approximately using the trapezoidal rule as a quadrature. 

3.2 Optimization strategy and algorithm for seismogram registration 

We detail a way to resolve the non-convexity issue of the optimization problem Q or (15]) for 
seismogram registration in Algorithm 1. 

The collection of discrete frequencies from to Ui is denoted by Qi = [0, uji]. We create M such 
sets, rii,ri2, ...^^M^ where rii C 1^2 C ... C ^m and cji < 6^2 < ... < ujm- Below, LPFj^(-) denotes 
the application of a low-pass filter with passband ftk — [0,(^k]- At the k^^ outer iteration step, 
both LFA traces D and U are low-pass filtered to the frequency range u ^ Qk^ resulting in the LF 
signals Dj^ and Uk- We let VFLFA,k foi" the expression of VFlfa with D^ and Uk in place of D and 
U. 

Algorithm 1 Seismogram registration. 

input: traces u{t) and d{t) 
initialize: p{t) = t, A(t) — 1 

LFA: D{t) ^ LFA{d{t)), U{t) ^ LFA{u{t)) 
for k^ 1,2,...,M do 

filter: Dk{t) ^ LPFj,{D{t)), Uk{t) ^ LPFj,{U{t)) 

while not converged do 

Compute: — q ' , — ^^ ' , and the Hessians Hp, Hc^ of the functional VFLFA,k. 

Newton step: p^p-H'^^^, a^a-H'^^^^ 
end while 
end for 
output: p{t)^ A{t) 

The maximum frequency ujm in the outer loop is set to a frequency below the central frequency 
of the source signature used to generate data. Numerical experiments show that sweeping up 
to half of the central frequency suffices for convergence. Most kinematic discrepancies between 
the observed data and the predicted data are fixed after sweeping up to 5 Hz, where central 
frequencies of the Ricker wavelets are above 15 Hz up to 50 Hz. Since registering every trace is 
time-consuming, registration is only performed once every 50 traces. The mappings for the other 
traces are interpolated. 
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Figure 2: Example 1: A synthetic trace is generated using the Marmousi velocity model and the 
other trace is created by warping with a known mapping. 



3.3 Examples of registration of synthetic traces 



Here, we demonstrate the registration capability of the non-convex formulation (6a). Our first 



example shows the registration of two noiseless synthetic traces containing many reflected waves. 
One of the two traces is obtained from a numerical experiment with the Marmousi velocity model. 
The other trace is the result of applying a warping map p : t \-^ t + 0.15 exp (—8{t/Tc — 1) ), 
where Tc is half the recording time. Unless otherwise stated, registration is performed from zero 
frequency with the LFA signal Uh in (6a). Figure [2] (bottom) shows a good registration match. 

For the second example, we test seismogram registration with noisy synthetic data. A syn- 
thetic trace is obtained from a numerical experiment with the Marmousi model and is used as 
reference data. The trace is then transported using the same function as used in example 1 in 
order to generate predicted data. Two independent realizations of Gaussian white noise with mean 
and standard deviation 0.075 are respectively added to the two traces. Figure [3] shows excellent 
registration results in the presence of strong noise. 

The third example uses two traces obtained from numerical experiments with two distinct 
velocity models. An observed trace is obtained from the Marmousi velocity model VMarmousi{x^ z) 
while we use a different velocity model Vpred{x^ z) — VMarmousi{x, z) —0.152: for the predicted trace. 
Due to the reduction in velocity, the predicted data lag behind the observed data up to a few wave 
periods in the coda. A good, though not perfect agreement is observed between the observed trace 
and the transported trace as shown in Figure [4| 
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Figure 3: Example 2: matching noisy synthetic traces. A synthetic trace is generated using the 
Marmousi velocity model and the other trace is created by applying a known mapping. The black 
arrows in the top figure connect corresponding peaks. 
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Figure 4: Example 3: matching traces with different amphtudes and phases from the Marmousi 
model and a modified Marmousi model: (top) given observed and predicted data, (bottom) two 
traces after registration. The black arrows indicate corresponding waves. The predicted trace is 
transported towards the observed trace and its amplitude is decreased. 



3.4 Low frequency augmentation 



The examples shown in the previous section use the LFA transformation Uh in (6a), which exploits 



the Hilbert transform to create low- frequency content. In this section, we give the rationale for our 



preference for Uh over both Ug and Ua defined by (6b) and (6c) respectively. In order to compare 



them, we look into the spectra of those LFA signals, the registration errors, and convergence of the 
optimization. 

We examine the effect of the different LFA transformations on the frequency content of a Ricker 
wavelet with the central frequency 15 Hz. The envelope signal Ue{t) = \u{t) + iT-L[u{t)]\ has strong 
energy in the frequency band around zero. However, the spectrum of the envelope signal Ue loses 
frequency content above the peak frequency, as shown in Figure [s] (c). The sum of the two signals u 
and Ue should have strong energy in both frequency bands. Our seismogram registration examples 
show that there is a definite improvement when using Uh{t) = u{t) + Ue{t) rather than simply 
Ue{t) for the LFA. Squaring an oscillatory signal of mean zero generates low- frequency components 
around zero Hz. However, weaker frequency components around the center frequency 15 Hz are 
observed; this is the consequence of convolution in the frequency domain. Similar patterns of low- 
frequency augmentation and weakened signals near the central frequency are found in the spectra 



of LFA signals obtained from LFA transformation (6c). Hence, the LFA transformation (6a) is 
preferred over the other two LFA transformations. 



We compare the registration results obtained from the three LFA transformations (6a), (|6b). 



and ( |6c| ). As a reference, the original and the transported traces using ( |6b[ ) are shown in Figure 
(top). Registration using Ua = \u\ and Us = u^ yield similar results, qualitatively and performance- 
wise. The registration errors for all three LFA are plotted in Figure |6] (bottom). In our tests, the 
Hilbert-based LFA transformation enjoys the smallest registration errors among the three methods. 
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Figure 5: Spectra of signals transformed via LFA transformations (|6a|)-(6c). (a) u{t) and its 
envelope Ue{t) = \u{t) + iT-L[u{t)]\, (b) Comparison of a wavelet u{t) and its LFA transformation 
Uh{t) obtained using the Hilbert transform ([6a]), (c) Comparison of spectra of the wavelet u{t) 



and of the envelope signal Ue{t)^ (d) Comparison of spectra of the wavelet u{t) and of the LFA 
signal Uh{t)^ (e) Comparison of spectra of the wavelet u{t) and of the LFA signal Us{t) = u^{t)^ (f) 
Comparison of spectra of the wavelet u{t) and of the LFA signal Ua(t) = |'^(^)|- The blue solid (red 
dashed) lines in (c)-(f) respectively correspond to spectra of the wavelet u{t) (of the LFA signals). 
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Figure 6: Matching synthetic data from the Marmousi model. The top figure shows the registration 
results using the LFA transform (6b). The bottom figure compares the registration errors among 
the three LFA signals [7/^, Ug^ and Ua- The black sohd (red marked, blue dashed) line in the 
bottom figures respectively corresponds to registration errors using the Hilbert transform (squaring, 
absolute value). 



We also compare the decay of the registration errors among the three LFA transformations as 
registration progresses from low to high frequencies. We observe numerically that all three LFA 
transformations enable seismogram registration to converge with a reduction of the data misfit by 
two orders of magnitude. The data misfits, i.e. the registration errors in the L^ norm, for the two 
LFA transformations (6b) and (6c) are slightly larger than those for (6a), as plotted in Figure [tI 

All three LFA transformations generate strong low- frequency signals, enabling the frequency 
sweep from zero frequency. However, we claim that Uh is the most appropriate among the three 
proposed LFA signals in terms of frequency content, convergence of frequency sweeping, and reg- 
istration errors. 



4 Numerical examples of full waveform inversion 

In this section, we demonstrate the potential of registration-guided least-squares (RGLS) optimiza- 
tion for waveform inversion in transmission settings with synthetic velocity models. Conventional 
LS optimization converges to a wrong velocity model in three examples with different source-receiver 
configurations. On the contrary, the RGLS method can decrease the model root-mean-square (rms) 
error by a few orders of magnitude, in spite of temporary increases in data misfit. We presume that 
the lack of monotonic decrease of the least-squares data misfit is in fact necessary for convergence 
to the true model. 
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Figure 7: Comparison of convergence among seismogram registration using three LFA signals: Uh, 
Us, and Ua- The black solid (red marked, blue dashed) line corresponds to L^ data misfit using Uh 
{Us, Ua), respectively. 

4.1 Numerical methods and experimental setup 

The acoustic scalar wave equation is used to model wave propagation in unit density media with 
heterogeneous velocity distributions. The equation is discretized with a 4^^ order accuracy finite 
difference scheme in space. The grid size is 501 by 501 with a distance of 5 m between grid points 
along both directions. Perfectly matched layers (PML) surround the computational domain. For 
the time discretization, the explicit 2^^ order leap-frog scheme is used. A Ricker wavelet with 
the center frequency 50 Hz is used as an acoustic source. This source signature is assumed to be 
known during inversion although it should be estimated in practice. The steepest descent method 
is used in order to update the velocity model; the gradient is computed from the usual adjoint- 
state imaging condition involving the forward incident wavefield and the backward adjoint field. 
For RGLS optimization, 100 to 150 iterations suffice to significantly reduce the model rms errors. 
Conventional LS optimization in many cases fails at reducing model rms errors altogether. (We 
still report the result of LS optimization after a similar number of iterations.) 



Example 


Case 


Reference model 


Initial model 


Data type 


Example 1 


HI 
LI 


Vh 
Vl 


5100 m/s 
6000 m/s 


crosshole 
crosshole 


Example 2 


H2 
L2 


Vh 

Vl 


5100 m/s 
6000 m/s 


nearly-complete 
nearly-complete 


Example 3 


R3 


Vr + Noise 


5100 m/s 


nearly-complete 



Table 1: Velocity models and data types for inversion examples. Reference velocity models Vh, 
Vl, and Vr are Vh{x,z) = 5200 + 900exp(-|(x,2) - (1250, 1250)|Vl06), Vl{x,z) = 5500- 
900exp(-|(x, z) - (1250, 1250)|VlO'^), and Vr{x, z) = 5000 -F 900 exp(-|(x, z) - (1250, 1250)|VlO*^), 
respectively. 
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The 2D velocity models with a high or low-velocity zone at the center are used in the follow- 
ing inversion examples as listed in Table [l] The velocity model in the third example contains 
noise generated by convolution of a Gaussian kernel with an array of normally distributed random 
numbers. The initial models are constant at Vinit{x,z) = 5100 or 6000 m/s, i.e., without any a 
priori knowledge about the true models. Such true and initial velocity models are chosen so that 
observed data do not contain caustics. Moreover, they are intended to result in large discrepancies 
in traveltime between predicted data and observed data. 

Two test configurations are used: crosshole and nearly-complete. The data type "crosshole" 
means that the sources are located near the left boundary edge while the receivers are on the 
right side of the numerical domain. This data type is used in example 1. In order to separate 
the ill-conditioning issue from the non-convexity issue, the second and third examples assume the 
availability of "nearly-complete" data. In this case, each shot consists of a source placed on one of 
the four boundary edges, and a full set of receivers on the other three edges. 

We compare convergence of the RGLS method quantitatively with that of the LS method for 
each case from HI to R3 listed in Table [ij We plot 1) true vs. converged velocity models, 2) data 
misfit vs. iteration count for both LS and RGLS, and 3) rms values of Vr — T4, where Vr is the 
true velocity model and Vk is the k^^ step velocity model. By data misfit, we simply mean the 
least-squares misfit ([l]). 

4.2 Inversion example 1: crosshole tests, cases HI and LI 

Example 1 compares the performance of RGLS and LS optimization using the velocity models Vh 
and Vl with limited data availability. The experiment configuration for this example is similar to 
that of a crosshole test in the field; each shot has a source located near the left boundary (x = 10) 
and 499 equally spaced receivers near the right boundary (x = 2490,5 < z < 2495). The total 
number of shots is 49. Due to the limited data and aperture, it is expected that parts of the domain 
are not well resolved even with the RGLS method: the inverse problem is inherently ill-posed. 

For the HI case (top row in Figure [s]), the RGLS method correctly updates the model velocity 
by increasing it near the center while the LS method decreases it to below 4500 m/s. The LS 
method ends up reducing the background velocity instead of increasing it. We understand this 
behavior as follows: in its drive to minimize the data misfit in the fastest possible way, the LS 
method finds it more favorable to put to zero predicted data rather than to make it fit the observed 
data. The problem is not cycle-skipping; in this case it is simply that the time supports of the 
predicted and observed data are disjoint. 

The two strong reflectors along the diagonals are not fully understood and deserve more in- 
vestigation. They seem to be related with the limited aperture of a crosshole test and consequent 
ill-conditioning. Such reflectors do not appear in the next examples with source-receiver configu- 
rations involving nearly-complete data. 

For the LI case with low velocity lens reference model, both methods seem to work to some 
degree, as shown in Figure [8} bottom row. The converged velocity model of the RGLS method is 
still much closer to the reference model than that of the LS method. The converged model of the 
LS method is shown in Figure |8) bottom right: it is technically closer to the reference model than 
the initial model is, though the iterations visibly led to a spurious model (local minimum). The 
LS method suffers from the same pathological behavior of putting to zero the predicted data, as 
observed in the previous example. 

For the HI case, LS optimization decreases the data misfit J[m/e], but increases the model rms 
error gradually, as shown in Figure |9] (top). In contrast, the RGLS method temporarily increases 
the data misfit, but decreases the model rms error, as the predicted data correctly move towards 
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Figure 8: Inversion example 1: crosshole transmission cases HI and LI. Plots of velocity models: 
(left) reference (true) models, (center) converged models of RGLS optimization, (right) converged 
models of LS optimization. Top and bottom rows correspond to high and low- velocity lens models, 
cases HI and LI, respectively. 

the observed data during the first 20 to 30 iterations. For the LI case, both RGLS and LS methods 
decrease model and data misfit errors, as shown in the bottom row of Figure [9j The RGLS 
optimization converges much faster than LS optimization, and converges to the correct minimum 
while LS gets stuck in a spurious minimum. 

4.3 Inversion example 2: nearly-complete data, cases H2 and L2 

In the second example, we add more data by considering sources on all four sides of the domain. 
For any source on a given side, we consider a fine sampling of 750 receivers on the other 3 sides. 
This is equivalent to a wide-aperture configuration which reduces the ill-conditioning. The total 
number of shots used for computing the gradient update is 196. Enhancing data availability has a 
huge effect on the quality of converged velocity models of the RGLS method. The set of true and 
initial velocity models is unchanged from the previous inversion example. 

For both the H2 and L2 cases, the updates of the RGLS method (100 iterations) are free of 
artifacts along the diagonals and the velocity models look very close to the reference models. See 



Figure 10 (center). 

The LS method behaves differently in the H2 case than in the L2 case: the method converges to 
a wrong model in the former case, and to the correct model in the latter case (though after a very 
large number of iterations). For the H2 case with a high velocity zone, LS optimization decreases 
the velocity model, hence updates the velocity model in the wrong direction. We attribute this 
behavior to the fact that the LS method seeks to make the predicted waves arrive later and with 
smaller amplitudes, hence again trying to put the prediction to zero rather than matching it to 
observed data. As a result, the model velocity reaches around 3500 m/s at the center, as shown 
in Figure [To] (top right), which is much lower than the initial velocity 5100 m/s. (However, notice 
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Figure 9: Inversion example 1: convergence of model rms error Vk — Vtme (left) and data misfit J 
(right). The top and bottom rows respectively correspond to the high- velocity HI and low- velocity 
LI lens models. 
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Figure 10: Inversion example 2: plots of velocity models: (left) reference, (center) converged 
model of RGLS optimization, (right) converged model of LS optimization. The top and bottom 
rows respectively correspond to the high-velocity and low-velocity lens models, i.e. cases H2 and 
L2, respectively. 



that the four corners seem to be updated properly. There is no cycle-skipping there: predicted 
data with short travel times are within a quarter of a wave length from corresponding observed 
data around the corners. ) 

In the L2 case, the LS method updates correctly parts of the domain near the boundary as 
well as the corners, as shown in Figure [Tol (bottom right). Interestingly, a hundred more iterations 
result in a much better velocity model with large errors only at the center. The area of the error 
zone at the center gets smaller as the iterations proceed. We understand this behavior as due to 
the fact that LS minimization whittles away at the center of the image by progressively matching 
data from the corners inward. 

For the LS method in the H2 case, the data misfit decreases but the model velocity error 
increases gradually, as shown in Figure [IT] (top). RGLS optimization correctly updates the velocity 
model, decreasing the model (velocity) error, and "flies above the nonconvex landscape" of the LS 
misfit cost function. As in the previous example, a hump is observed in the Lp' data misfit. In the 
L2 case, both RGLS and LS converge, reducing both data and model errors. However, RGLS is 
much faster. LS updates the model in a slow, piecemeal way from the boundary inwards. 



4.4 Inversion example 3: nearly-complete data, case R3 

Finally, we test the RGLS method with a more complicated reference velocity model shown in 
Figure [l2| (left). The configuration of sources and receivers is the same as in inversion example 2; 
sources on all sides and receivers on the 3 opposite sides. The medium-scale details of the model 
are successfully recovered by RGLS optimization. RGLS optimization stalls the data misfit after 63 
iterations: the inversion then switches from RGLS to LS. (Note that the LS method is close to the 
special case a — I'm the construction of fractionally warped data, hence the late-game switch to LS 
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Figure 11: Inversion example 2: convergence of model (velocity) rms error Vk — Vtme (left) and 
data misfit J (right) The top and bottom rows respectively correspond to the high-velocity H2 and 
low- velocity L2 lens models. 
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Figure 12: Inversion example 3: Plots of velocity models of (left) true model, (center) of converged 
model of RGLS optimization, and (right) of converged model of LS optimization. 
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Figure 13: Inversion example 3: convergence of model (velocity) rms error Vk 
data misfit J (right) 
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is more of a parameter adjustment than an ad- hoc fix.) Switching to LS is safe because observed 
data are now within a fraction of a wavelength of predicted data. Using a velocity model with 
stronger randomness would make the RGLS method fail, because the observed data would contain 
many refracted waves that could not be matched to the simple one-wave prediction by warping. 

5 Discussion 

The proposed method has a limitation; seismogram registration requires the predicted and observed 
traces to be comparable. When the observed and predicted seismograms differ by substantially more 
than a warping, the registration-based method may not make sense at all. This issue often occurs 
with reflection data; it is typically difficult to create interfaces in the model that generate the right 
number of reflected waves in the first iteration. The transmission case is much more favorable 
as it often generates a prediction with the same number of arrivals as the data, albeit with large 
traveltime delays. 

6 Conclusions 

We present registration guided least-squares (RGLS) as a way to overcome the cycle-skipping 
problem in FWI, thereby extending the basin of attraction to the global minimizer. The successful 
application of the RGLS method to seismic inversion problems is demonstrated in the transmission 
setting, where the conventional LS method often converges to a wrong model. The proposed 
method substitutes a transported version of the prediction, referred to as fractionally warped 
data, for the observed data in the conventional least-squares misfit residual. In order to generate 
transported data, mappings in the form of piecewise polynomials are found through a non-convex 
optimization formulation. The non-convex optimization problem is tackled in a multiscale manner 
similar to frequency sweep/continuation in frequency domain FWI. In order to create the low 
frequencies which may be absent in data, low-frequency augmented (LFA) signals are proposed 
and demonstrated to provide a satisfying alternative to the raw seismograms for the registration 
step. A method using the envelope property of the Hilbert transform is proposed for this LFA 
transformation. Three inversion examples using seismogram registration and the RGLS method 
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show that the proposed method decreases model errors monotonicahy while it allows the data misfit 
to increase temporarily prior to eventual convergence. 
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